Context Models for OOV Word Translation in Low-Resource Languages
نویسندگان
چکیده
Out-of-vocabulary word translation is a major problem for the translation of low-resource languages that suffer from a lack of parallel training data. This paper evaluates the contributions of target-language context models towards the translation of OOV words, specifically in those cases where OOV translations are derived from external knowledge sources, such as dictionaries. We develop both neural and non-neural context models and evaluate them within both phrase-based and self-attention based neural machine translation systems. Our results show that neural language models that integrate additional context beyond the current sentence are the most effective in disambiguating possible OOV word translations. We present an efficient second-pass lattice-rescoring method for wide-context neural language models and demonstrate performance improvements over state-of-the-art self-attention based neural MT systems in five out of six low-resource language pairs.
منابع مشابه
Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
Out-of-vocabulary (OOV) word is a crucial problem in statistical machine translation (SMT) with low resources. OOV paraphrasing that augments the translation model for the OOV words by using the translation knowledge of their paraphrases has been proposed to address the OOV problem. In this paper, we propose using word embeddings and semantic lexicons for OOV paraphrasing. Experiments conducted...
متن کاملTranslation of Unknown Words in Low Resource Languages
We address the problem of unknown words, also known as out of vocabulary (OOV) words, in machine translation of low resource languages. Our technique comprises a combination of methods, inspired by the common OOV types observed. We also design evaluation techniques for measuring coverage of OOVs achieved and integrate the new translation candidates in a Statistical Machine Translation (SMT) sys...
متن کاملWord Transduction for Addressing the OOV Problem in Machine Translation for Similar Resource-Scarce Languages
Similar languages have a large number of cognate words which can be exploited to deal with Out-Of-Vocabulary (OOV) words problem. This problem is especially severe for resource-scarce languages. We propose a method for ‘word transduction’ for addressing this problem. We take advantage of the fact that, although it is difficult to prepare sentence aligned parallel corpus for such languages, it i...
متن کاملSublexical Translations for Low-Resource Language
Machine Translation (MT) for low-resource language has low-coverage issues due to Out-OfVocabulary (OOV) Words. In this research we propose a method using sublexical translation to achieve wide-coverage in Example-Based Machine Translation (EBMT) for English to Bangla language. For sublexical translation we divide the OOV words into sublexical units for getting translation candidates. Previous ...
متن کاملMonolingual Distributional Profiles for Word Substitution in Machine Translation
Out-of-vocabulary (OOV) words present a significant challenge for Machine Translation. For low-resource languages, limited training data further increases the frequency of OOV words and degrades the quality of the translations. Past approaches have suggested using stems or synonyms for OOV words. Unlike the previous methods, we propose handling not just the OOV words but rare words as well in a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1801.08660 شماره
صفحات -
تاریخ انتشار 2018